… simulated genes and with varying numbers of DEGs from 2,000 to …
It can be seen that both sets of parameters converge quickly.
The algorithm is iterated until the model parameters converge or
the maximum learning cycle is reached. Afterwards, the null density and the
alternative density are estimated for each gene. The Bayes rule is used to
determine whether a gene is a DEG.
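
The Bayes-rule decision described above can be illustrated with a minimal Python sketch. It is not the DSG implementation: the Gaussian densities, the prior proportion of DEGs (`prior_deg`) and the 0.5 posterior threshold are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_deg(null_density, alt_density, prior_deg):
    """Posterior probability of the alternative (DEG) class by the Bayes rule."""
    numerator = prior_deg * alt_density
    denominator = numerator + (1.0 - prior_deg) * null_density
    return numerator / denominator if denominator > 0.0 else 0.0


def is_deg(null_density, alt_density, prior_deg, threshold=0.5):
    """Call a gene a DEG when its posterior exceeds the (assumed) threshold."""
    return posterior_deg(null_density, alt_density, prior_deg) > threshold


# Illustrative values only: a gene with observed value 4.0, a standard
# normal null density and an assumed alternative density N(5, 2^2).
x = 4.0
f0 = gaussian_pdf(x, 0.0, 1.0)
f1 = gaussian_pdf(x, 5.0, 2.0)
print(is_deg(f0, f1, prior_deg=0.1))
```

In practice the two densities and the prior would come from the fitted model rather than from fixed parameters as here.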
DSG for simulated data DEG discovery
A simulated data set of 900 non-DEGs and 100 DEGs was designed [Al
2015]. Only one replicate was used. The design was composed of the following
segments. All non-DEGs were designed to follow a Gaussian
distribution centred at zero with a unit standard deviation. In total, 50
down-regulated DEGs were designed to follow a Gaussian distribution
centred at negative five with a standard deviation of two. In addition, 50
up-regulated DEGs were designed to follow a Gaussian distribution centred
at five with a standard deviation of two. Figure 6.49(a) shows the estimated
densities for this simulated data set. It can be seen that the alternative density
has a very wide distribution compared with the null density. Figure 6.49(b)
shows the ROC curve for this DSG model. The AUC value was 0.996.
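
To make the simulation design concrete, the following Python sketch generates a data set with the same composition and computes an AUC. Several things here are assumptions rather than part of the original study: the up-regulated mean of +5 is assumed by symmetry with the down-regulated group, the absolute value of each gene's measurement is used as a stand-in score instead of the DSG posterior, and the function names and seed are hypothetical. The printed AUC is therefore not expected to reproduce the reported 0.996.

```python
import random


def simulate(n_null=900, n_down=50, n_up=50, alt_sd=2.0, seed=0):
    """Draw one replicate: non-DEGs from N(0, 1), DEGs from N(-5, sd) and N(+5, sd)."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    values += [rng.gauss(-5.0, alt_sd) for _ in range(n_down)]
    values += [rng.gauss(5.0, alt_sd) for _ in range(n_up)]  # +5 is an assumption
    labels = [0] * n_null + [1] * (n_down + n_up)
    return values, labels


def auc(scores, labels):
    """AUC from the Mann-Whitney rank sum, using midranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


values, labels = simulate()
scores = [abs(v) for v in values]  # stand-in score, not the DSG posterior
print(round(auc(scores, labels), 3))
```

Any monotone score of evidence against the null could replace `abs(v)`; the rank-based AUC depends only on the ordering of the scores.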
The next question investigated was whether the alternative standard
deviation caused a difference in DEG discovery using DSG. Rather than
being fixed at two, the standard deviation for both down-regulated DEGs and
up-regulated DEGs was varied from one to five. Therefore, the overlap
between the null density of non-DEGs and the alternative density of DEGs
was varied in this trial. For each standard deviation, 50 DSG models were
constructed and the Jackknife test was used to evaluate these 50 models.
An AUC was calculated for the evaluation. Thus, how model
performance (AUC) varied with the overlap between the non-DEG
distribution (the null density) and the DEG distribution (the alternative
density) was examined. Figure 6.50 shows the result. It can be seen that
the model performance (AUC) of DSG deteriorated slightly when the
overlap between the null density and the alternative density was increased.
This is not surprising, because greater overlap makes the two classes harder to separate.
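
The relationship between the alternative standard deviation and the overlap of the two densities can be sketched numerically. The Python example below is only an illustration under stated assumptions: the alternative density is taken to be an equal mixture of N(-5, sd) and N(+5, sd), the overlap is measured as the integral of the pointwise minimum of the two densities, the score is again the absolute value rather than the DSG posterior, and the Jackknife evaluation is replaced by a plain average over 50 simulated replicates. It does not reproduce Figure 6.50.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def overlap(alt_sd, lo=-30.0, hi=30.0, steps=6000):
    """Integral of min(null, alternative) over [lo, hi] by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        f0 = normal_pdf(x, 0.0, 1.0)
        # Assumed alternative: equal mixture of down- and up-regulated DEGs.
        f1 = 0.5 * normal_pdf(x, -5.0, alt_sd) + 0.5 * normal_pdf(x, 5.0, alt_sd)
        total += min(f0, f1) * dx
    return total


def rank_auc(scores, labels):
    """Rank-based AUC; assumes no tied scores, which holds for continuous draws."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(rank + 1 for rank, i in enumerate(order) if labels[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def mean_auc(alt_sd, n_models=50, seed=0):
    """Average AUC of the stand-in score over n_models simulated replicates."""
    rng = random.Random(seed)
    labels = [0] * 900 + [1] * 100
    aucs = []
    for _ in range(n_models):
        values = [rng.gauss(0.0, 1.0) for _ in range(900)]
        values += [rng.gauss(-5.0, alt_sd) for _ in range(50)]
        values += [rng.gauss(5.0, alt_sd) for _ in range(50)]
        aucs.append(rank_auc([abs(v) for v in values], labels))
    return sum(aucs) / len(aucs)


for sd in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(sd, round(overlap(sd), 4), round(mean_auc(sd), 4))
```

Printing the overlap next to the mean AUC for each standard deviation makes the trade-off directly visible for this stand-in score.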